Abstract - Today, very large amounts of information are 
available in online documents, and a growing portion of this 
information comes in the form of people's experiences and 
opinions. It would be helpful for companies, recommender 
systems, and review or editorial sites to compile digests of 
such information automatically. Summaries of people's 
experiences and opinions that consist of subjective 
expressions extracted from reviews have proven quite useful 
in such contexts. Here we develop an automated sentiment 
analysis program that analyzes all the comments and gives an 
overall feedback rating. In this work, an SVM-based 
regression method is used to develop and normalize the 
evaluation rating of an educational application. A stop list is 
added as a feature to make processing more robust. In the 
implementation, we use external comment files, and each word 
in every comment is compared with a pre-created word file 
using the naive string search algorithm. This process is 
repeated for every comment in the database. For every match, 
the score evaluated at both aspect level and document level is 
normalized against the normalized list, and a rating is 
generated from every comment. The aggregated results are 
then shown in pictorial form. The technique deals with the 
sentiments and textual comments given by the users. 

Index Terms - aspect-level method, naive string search 
algorithm, regression method, rating of keywords. 

I. INTRODUCTION 

Sentiment analysis aims to uncover the attitude of users 
toward a particular application from their textual comments. 
Other terms used to denote this research area include “opinion 
mining” and “subjectivity detection”. It uses natural language 
processing and machine learning techniques to find statistical 
and/or linguistic patterns in the text that reveal attitudes. It has 
gained popularity in recent years due to its immediate 
applicability in business environments, such as summarizing 
feedback from product reviews, discovering collaborative 
recommendations, or assisting in election campaigns. 
Previous works focus on two important properties of text: 

1. Subjectivity - whether the style of the sentence is 
subjective or objective. 

2. Polarity - whether the author expresses a positive or 
negative opinion. 

Most prior work on the specific problem of categorizing 
expressly opinionated text has focused on the binary 
distinction of positive vs. negative (Turney, 2002; Pang, Lee, 
and Vaithyanathan, 2002; Dave, Lawrence, and Pennock, 
2003; Yu and Hatzivassiloglou, 2003). But it is often helpful 
to have more information than this binary distinction 
provides, especially if one is ranking items by 
recommendation or comparing several reviewers' opinions: 
example applications include collaborative filtering and 
deciding which conference submissions to accept. Therefore, 
in this paper we consider generalizing to finer-grained 
scales, rather than just determining whether a review is 
“thumbs up” or not. 

We use an aspect-level method for sentiment analysis of 
textual comments, because this method gives a detailed view 
of opinions about a product. In this method, we first decide 
on four aspects related to educational apps. Ratings for the 
aspect-level keywords are obtained by normalizing evaluation 
ratings with a regression method, and a table of keywords and 
their corresponding values is created. Next, keywords based 
on linguistic features are extracted from the input textual data 
and assigned to these four aspects. The naive string search 
algorithm is then used for matching, and the score is 
aggregated for each matched keyword. Finally, the aggregated 
results are shown in pictorial form. 

The rest of the paper is organized as follows: Section II 
reviews previous techniques for sentiment analysis, Section 
III describes the method used in our work, and Section IV 
shows the results. 


II. RELATED WORK 

We propose a method for sentiment analysis based on the 
aspect level. It is related to some earlier work, which we 
explain here. 

A. NLP part-of-speech sentiment analysis [4] 

In this method, SAS data miner 7.1 is used for data mining, as 
the input data is often short and contains ungrammatical 
sentences and slang. For sentiment analysis, SAS Sentiment 
Analysis Studio is used, which has two modes: a rule-based 
model and a statistical model. In the rule-based model, main 
keywords such as product names and features are tagged as 
nouns. A statistical model is then built for the remaining 
part-of-speech tagging; features learned from the first model 
are imported to start POS tagging, and these keywords are 
automatically matched with the corpus directory. Then 
adverbs (ADV), negative adjectives (NEGADJ), positive 
adjectives (POSADJ), and verbs (VERB) are tagged, and 
different lists are prepared for more or less positive and 
negative sentences. After this, the CLASSIFIER rule is used 
to match a term or phrase for features, the CONCEPT rule to 
locate related terms, and DIST_n for the number of matches. 
Weightage is given to the rules from these matches, and then 
positive and negative results are calculated. 

B. Feature-based heuristic for aspect-level sentiment 
classification [1] 

In this method, aspect-level labels are assigned to the input 
text, scores on each aspect are aggregated, and a net profile of 
a product is generated on all parameters. A SentiWordNet- 
based scheme with two different linguistic feature selections 
is used, combining adjectives, adverbs, verbs, and n-grams. 
The algorithmic formulation is applied to both document-level 
and aspect-level analysis: the algorithm first extracts 
opinionated terms and looks up their scores in SentiWordNet. 
Using SentiWordNet is difficult, as many decisions must be 
taken regarding which linguistic features to use and what 
weight to give them. Scores for each extracted word are 
obtained from the SentiWordNet library, and aggregation of 
the scores gives the final results. 

C. NLP techniques [6] 

Two natural language processing techniques are used for 
semantic extraction from textual input: linguistic patterns and 
extraction rules. Extraction rules are used to identify words in 
the corpus. First the noun phrase is extracted, then more 
complex patterns (noun phrase and verb), and after that more 
extended patterns (noun phrase, verb, preposition, 
conjunctions, etc.). In linguistic patterns, tagging is done at 
the domain level, so words related to the domain are 
extracted. Linguistic patterns therefore give more accurate 
results than extraction rules. 

D. SVM machine learning technique [12] 

This technique gives a subjective or objective opinion about 
text, i.e. a positive or negative opinion, so it is not well suited 
to finer-grained rating scales. 

Of all the previous work, the aspect-level or domain-level 
technique gives the most accurate results, so we use this 
technique in our work. 

III. METHODOLOGY 

Here we match certain predefined words against the comments 
given by users of the product. Every word or group of words 
has a certain significance and a corresponding grade in the 
formulation table of the database. Every significant word 
increases the grade of the product review in terms of the 
following: 

1. Quality 

2. Usage 

3. Graphics 

4. Time to respond, etc. 
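
As a minimal sketch of the idea above, the formulation table can be pictured as a mapping from each keyword to an aspect and a grade. The keywords, the grades, and the names FORMULATION_TABLE and grade_comment are illustrative assumptions; they are not taken from the database used in this work.

```python
from collections import defaultdict

# Hypothetical formulation table: keyword -> (aspect, grade).
# Keywords and grades are made up for illustration only.
FORMULATION_TABLE = {
    "excellent": ("Quality", 3),
    "useful": ("Usage", 2),
    "colorful": ("Graphics", 2),
    "responsive": ("Time to respond", 2),
}


def grade_comment(comment):
    """Sum the grades of every table keyword found in one comment, per aspect."""
    totals = defaultdict(int)
    for word in comment.lower().split():
        word = word.strip(".,!?;:")  # drop punctuation attached to the word
        if word in FORMULATION_TABLE:
            aspect, grade = FORMULATION_TABLE[word]
            totals[aspect] += grade
    return dict(totals)


print(grade_comment("Excellent app, very useful and responsive."))
```

Keeping the aspect next to each keyword lets one pass over a comment update all four aspect totals at once.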

Unlike previous work, for sentiment analysis of textual data 
we use regression techniques to give scores and ratings to 
pre-extracted keywords. These scores are used to show 
graphical results at the aspect level. For keyword matching, 
the naive string search algorithm is used in our work. 

We search for those particular words and grade each one 
according to its prescribed rating. The steps of this process 
are shown in Figure 1 and explained below. 

A. INPUT 

Get the input from the comment file. 

In the first stage we get input from both the predefined word 
list and the comment file. These inputs are then ready to be 
sent to the string matching stage. 

Figure 1: Methodology 

B. RATING 

Here ratings are given to the keywords decided at the aspect 
level. These ratings are used to show the sentiment analysis in 
graphical form. There are several methods to calculate ratings 
and assign them to keywords, but in our work we use the 
Regression Method of Normalization of Evaluating Rating. 

For the rating calculation of aspect words, we mainly make 
use of the semantic orientation computation based on HowNet 
proposed by Zhu [13], which is reflected in formula (1). 

$SO_{sim}(w) = \max_{i} \, \mathrm{similarity}(w, t_i)$ (1) 

Here $SO_{sim}(w)$ denotes the rating value of w, $t_i$ stands 
for the i-th word in the string, and $\mathrm{similarity}(w, t_i)$ 
is the semantic similarity computed between the two words. 
We take the maximum similarity value among those calculated 
between the target word and all baseline words as the target 
word's rating value. 
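
Formula (1) can be sketched as follows. The HowNet-based semantic similarity is not reproduced here; as a loudly labeled stand-in, the sketch uses the character-overlap ratio from Python's difflib, which only measures spelling similarity. The names similarity and so_sim are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(w, t):
    """Stand-in similarity in [0, 1]; NOT the HowNet semantic measure."""
    return SequenceMatcher(None, w, t).ratio()


def so_sim(w, baseline_words):
    """Formula (1): maximum similarity between w and any baseline word t_i."""
    return max(similarity(w, t) for t in baseline_words)


# A word that is itself a baseline word gets the maximum rating value.
print(so_sim("good", ["bad", "good"]))  # 1.0
```

Any function returning a similarity score can be swapped in for the stand-in without changing so_sim.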

We implement a regression algorithm to find the rating of an 
education-based application. The rating of any aspect is 
defined as follows. We take a regression perspective by 
assuming that the labels come from a discretization of a 
continuous function g mapping from the feature space to a 
metric space. If we choose g from a family of sufficiently 
“gradual” functions, then similar items necessarily receive 
similar labels. In particular, we consider linear, ε-insensitive 
SVM regression. Linear regression has also been applied to 
classify documents (in a different corpus than ours) with 
respect to a three-point rating scale. We round the aggregate 
count up to the nearest ten and take it as the overall factor 
from which the rating is evaluated. For example, if the 
keyword “good” appears in a document 49 times and the 
keyword “bad” 40 times, the nearest ten is 50. On a scale of 
10, this gives a rating of 10 for “good” and 8 for “bad”. The 
overall rating is the difference of the two ratings, which 
comes out to 2 out of 10. 
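
The worked example above can be reproduced with a short sketch. It assumes that the base is the larger count rounded up to the nearest ten, and that each rating is the count scaled to 10 and rounded to the nearest integer; the text does not state the rounding rule, and the function name normalized_ratings is hypothetical.

```python
import math


def normalized_ratings(pos_count, neg_count, scale=10):
    """Return (positive rating, negative rating, overall rating)."""
    # Round the larger count up to the nearest ten (49 -> 50).
    base = math.ceil(max(pos_count, neg_count) / 10) * 10
    if base == 0:
        return 0, 0, 0  # no keyword occurrences at all
    pos_rating = round(pos_count / base * scale)  # assumed rounding rule
    neg_rating = round(neg_count / base * scale)
    return pos_rating, neg_rating, pos_rating - neg_rating


print(normalized_ratings(49, 40))  # (10, 8, 2)
```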

C. STRING MATCHING 

In this stage, each word of the comment file is matched 
against all words of the same length in the word list. The 
naive string search algorithm is used for string matching. The 
word list acts as the predefined list of significant words used 
to rate the product. The same procedure is carried out for all 
words of the comment file, i.e. each word of the comment file 
is matched against the words of the word list. The basic 
principle is to look only for words of the same length in the 
predefined word list, which simplifies the later stage of 
evaluating the product rating. 

At this point half of the task is complete, and we move to the 
next stage. 
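
A minimal sketch of this stage is given below. naive_search is the textbook naive string search, and match_words applies it only to word-list entries of the same length as the comment word. Both function names and the sample words are assumptions made for illustration.

```python
def naive_search(text, pattern):
    """Return every index at which pattern occurs in text (naive scan)."""
    n, m = len(text), len(pattern)
    matches = []
    if m == 0:
        return matches
    for start in range(n - m + 1):
        k = 0
        # Compare character by character from the current start position.
        while k < m and text[start + k] == pattern[k]:
            k += 1
        if k == m:
            matches.append(start)
    return matches


def match_words(comment, word_list):
    """Match each comment word only against list words of the same length."""
    by_length = {}
    for entry in word_list:
        by_length.setdefault(len(entry), []).append(entry)
    matched = []
    for word in comment.lower().split():
        for candidate in by_length.get(len(word), []):
            # Equal length plus a match at index 0 means the words are equal.
            if naive_search(word, candidate) == [0]:
                matched.append(word)
                break
    return matched


print(match_words("good app but bad menus", ["good", "bad", "slow"]))
```

Grouping the word list by length first means a comment word is never compared against entries that cannot match it.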

D. GRADE EVALUATION 

In this stage, we evaluate each matched word against our 
grade list. Each word in the list has its own significance in 
terms of grading. Groups of specific words in a specific order 
also determine specific evaluation points for the rating, so the 
grades of each word and each group of predefined words are 
computed. The resulting grade is the users' rating of the 
product in different fields such as Product usage, Interface, 
Knowledge, Quality, etc. A weighted formula is used for grade 
evaluation. 

Making use of the ratings generated, we can calculate 
$fp_{c_i}$ and $fn_{c_i}$ for each character appearing in the 
string. $fp_{c_i}$ and $fn_{c_i}$ respectively denote the 
frequencies of a character $c_i$ in the positive and negative 
words. Formulas (2) and (3) use the percentage of a character 
in positive/negative words to show its sentiment tendency. 

$P_{c_i} = \frac{fp_{c_i} / \sum_{j=1}^{n} fp_{c_j}}{fp_{c_i} / \sum_{j=1}^{n} fp_{c_j} + fn_{c_i} / \sum_{j=1}^{m} fn_{c_j}}$ (2) 


E. RE-MATCHING WITH STOP LIST 

Whenever the comment file changes, it undergoes the same 
stages of string matching and grade evaluation. This makes 
the system dynamic and accurate enough to measure the 
performance of the product. In this way, the rating evaluation 
can easily be carried out. 

The stop list is a new feature added in our work. Words that 
are irrelevant to the aspect-level words are removed from the 
string before the grade evaluation formula is applied. With 
this feature, processing becomes faster and gives good results 
in less time. 
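
Stop-list filtering can be sketched as below. The contents of STOP_LIST and the name remove_stop_words are illustrative assumptions; the actual stop list used in this work is not given in the text.

```python
# Hypothetical stop list: words assumed to carry no aspect-level meaning.
STOP_LIST = {"the", "is", "a", "an", "and", "this", "it", "to", "of"}


def remove_stop_words(comment, stop_list=STOP_LIST):
    """Drop stop-list words so that fewer words reach string matching."""
    return [w for w in comment.lower().split() if w not in stop_list]


print(remove_stop_words("This app is good and fast"))  # ['app', 'good', 'fast']
```

Because the stop list is a set, each membership check takes constant time on average, so the filter costs far less than matching every word against the full word list.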


F. SHOW GRAPHICAL RESULTS 

After all this processing, the results of the sentiment 
analysis are shown in graphical form. 

The number of comments can be found as: 

$Score_t = count_t(\text{pos. word}) + count_t(\text{neg. word})$ (6) 

This overall count is obtained by adding the positive and 
negative comments together for an application. As in formula 
(1), the rating value of a target word is the maximum semantic 
similarity between it and the baseline words $t_i$ of the 
polarity lexicon. 
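
One possible reading of formula (6) is sketched below: positive and negative word occurrences are counted over all comments of an application and then added. The two lexicons and the name score are assumptions made for illustration.

```python
POSITIVE_WORDS = {"good", "great", "useful"}  # illustrative lexicon
NEGATIVE_WORDS = {"bad", "slow", "boring"}    # illustrative lexicon


def score(comments):
    """Return (positive count, negative count, overall count) as in (6)."""
    pos = neg = 0
    for comment in comments:
        for word in comment.lower().split():
            if word in POSITIVE_WORDS:
                pos += 1
            elif word in NEGATIVE_WORDS:
                neg += 1
    return pos, neg, pos + neg


print(score(["good and useful", "slow app", "nothing here"]))  # (2, 1, 3)
```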


IV. RESULTS 


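$N_{c_i} = \frac{fn_{c_i} / \sum_{j=1}^{m} fn_{c_j}}{fp_{c_i} / \sum_{j=1}^{n} fp_{c_j} + fn_{c_i} / \sum_{j=1}^{m} fn_{c_j}}$ (3) 
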
Results generated from the whole process are given below, 
and they are shown for varying numbers of comments. The 
resulting grade is the users' rating of the product in different 
fields such as Product usage, Interface, Knowledge, Quality, 
etc. 


$P_{c_i}$ and $N_{c_i}$ respectively denote the weights of 
$c_i$ as a positive and a negative character; n and m 
respectively denote the total number of unique characters in 
positive and negative words. The difference 
$P_{c_i} - N_{c_i}$ in formula (4) determines the sentiment 
tendency of character $c_i$. If it is positive, the character 
appears more often in positive words, and vice versa. A value 
close to 0 means that it is not a sentiment character or that it 
is a neutral sentiment character. 



Figure 2: Results for all positive comments 


$S_{c_i} = P_{c_i} - N_{c_i}$ (4) 

In this way, when expanding the polarity lexicon, the average 
rating value of the characters of each new word is calculated 
as in formula (5), where n stands for the number of characters 
in word w. If a character without a rating value appears, its 
value defaults to zero. 

$SO_{character}(w) = \frac{1}{n} \sum_{j=1}^{n} S_{c_j}$ (5) 

In this way grade evaluation is done. These values are stored 
to show the graphical results. 
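
The character-level scoring described by formulas (2) to (5) can be sketched as follows. The sketch assumes that the positive weight of a character is its share among positive-word characters divided by the sum of its positive and negative shares, and symmetrically for the negative weight. The word lists and the names char_scores and so_character are hypothetical.

```python
from collections import Counter


def char_scores(positive_words, negative_words):
    """Return S_c = P_c - N_c for every character seen in either word list."""
    fp = Counter("".join(positive_words))  # frequencies in positive words
    fn = Counter("".join(negative_words))  # frequencies in negative words
    total_p = sum(fp.values())
    total_n = sum(fn.values())
    scores = {}
    for c in set(fp) | set(fn):
        p_share = fp[c] / total_p if total_p else 0.0
        n_share = fn[c] / total_n if total_n else 0.0
        denom = p_share + n_share  # > 0 because c occurs in at least one list
        p_c = p_share / denom      # assumed form of formula (2)
        n_c = n_share / denom      # assumed form of formula (3)
        scores[c] = p_c - n_c      # formula (4)
    return scores


def so_character(word, scores):
    """Formula (5): mean character score; unknown characters count as zero."""
    if not word:
        return 0.0
    return sum(scores.get(c, 0.0) for c in word) / len(word)


scores = char_scores(["aa"], ["ab"])
print(so_character("ab", scores))
```

In this toy input the character "b" occurs only in a negative word, so its score is -1, while "a" occurs in both lists and ends up mildly positive.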



Figure 3: Results for approximately 2500 comments 

Figure 4: Results for 5000 comments 

After evaluating all these results, a summary of the time 
comparison is shown in Table 1. 

Table 1: Time comparison 



In future work, performance measures can be further 
enhanced. Sentiment analysis can be applied to a different 
range of applications, as our work covers only educational 
apps. The main challenges are the use of other languages, 
dealing with negation expressions, producing a summary of 
opinions based on product features/attributes, the complexity 
of sentences/documents, handling of implicit product 
features, etc. 
No. of      Time taken without     Time taken with 
comments    stop list (in secs)    stop list (in secs) 

1000        5.567                  5.408 
2500        6.5892                 5.8013 
5000        12.3474                11.08 


This table shows the time comparison between the two 
sentiment analysis systems. The system with the stop list 
performs better because it reduces processing time. As the 
results show, the aggregate time for the old system is 
approximately 1.22 and for the new system with the stop list 
is 1.11, so the difference between the time taken by the two 
systems is 0.11. This shows that the system with the stop list 
is better than the old system. 

V. CONCLUSION 

Sentiment analysis is used in a wide range of applications 
such as classifying reviews, summarizing reviews, and other 
real-time applications. From our research work we conclude 
that the aspect-level scheme for sentiment analysis is very 
useful, as this method gives detailed results about application 
and product reviews based on different domains. Other 
researchers have also confirmed that the aspect-level scheme 
and linguistic pattern approaches are very useful, reporting 
an accuracy of 98.9% [3]. 

From the above work it is evident that neither classification 
model consistently outperforms the other, and that different 
types of features have distinct distributions. It is also found 
that different types of features and classification algorithms 
can be combined in an efficient way in order to overcome 
their individual drawbacks, benefit from each other’s merits, 
and finally enhance the sentiment classification performance. 
The newly added stop list feature enhances the performance 
of the sentiment analysis system: as the results show, the 
aggregate time for the old system is approximately 1.22 and 
for the new system with the stop list is 1.11, so the difference 
between the time taken by the two systems is 0.11. This 
shows that the system with the stop list is better than the old 
system. 